Performance database: capturing data for optimizing distributed streaming workflows.

Authors

Chee Sun Liew

Malcolm P Atkinson

Radoslaw Ostrowski

Murray Cole

Jano I van Hemert

Liangxiu Han

Abstract

The performance database (PDB) stores performance-related data gathered during workflow enactment. We argue that, by carefully understanding and manipulating these data, we can improve efficiency when enacting workflows. This paper describes the rationale behind the PDB, and proposes a systematic way to implement it. The prototype is built as part of the Advanced Data Mining and Integration Research for Europe project. We use workflows from real-world experiments to demonstrate the use of the PDB.

Similar resources

Performance Database: Capturing Data for Optimising Distributed Streaming Workflows

It is evident that data-intensive research is transforming the computing landscape, as recognised in “The Fourth Paradigm” [1]. Due to the scale, complexity and heterogeneity of data gathered in scientific experiments, we cannot naively dump the data into computing resources and hope to extract useful information and knowledge through exhaustive and unstructured computations. To survive t...

A Compiler Toolchain for Distributed Data Intensive Scientific Workflows

by Peter Bui. With the growing amount of computational resources available to researchers today and the explosion of scientific data in modern research, it is imperative that scientists be able to construct data processing applications that harness these vast computing systems. To address this need, I propose applying concepts from traditional compilers, linkers, and profilers to the constructio...

Complexity Analysis and Performance Optimization of Distributed Computing Workflows: From Theory to Practice

The advance of supercomputing technology is expediting the transition in various basic and applied sciences from traditional laboratory-controlled experimental methodologies to modern computational paradigms involving complex numerical model analyses and extreme-scale simulations. These computation-based simulations and analyses have become an essential research and discovery tool in next-genera...

Optimizing Query Processing in Batch Streaming System

With the growing need of processing “big data” in real time, modern streaming processing systems should be able to operate at the cloud scale. This imposes challenges to building large scale stream processing systems. First, processing tasks should be efficiently distributed to worker nodes with small overhead. Second, streaming data processing should be highly available, despite that failures ...

A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows

Integrated provenance support promises to be a chief advantage of scientific workflow systems over script-based alternatives. While it is often recognized that information gathered during scientific workflow execution can be used automatically to increase fault tolerance (via checkpointing) and to optimize performance (by reusing intermediate data products in future runs), it is perhaps more si...

Journal title:

Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Volume 369, Issue 1949

Pages: -

Publication date: 2011
